Soft mask estimation for single channel speaker separation

نویسندگان

  • Aarthi M. Reddy
  • Bhiksha Raj
چکیده

The problem of single channel speaker separation, attempts to extract a speech signal uttered by the speaker of interest from a signal containing a mixture of auditory signals. Most algorithms that deal with this problem, are based on masking, where reliable components from the mixed signal spectrogram are inversed to obtain the speech signal from speaker of interest. As of now, most techniques, estimate this mask in a binary fashion, resulting in a hard mask. We present a technique to estimate a soft mask that weights the frequency sub-bands of the mixed signal. The speech signal can then be reconstructed from the estimated power spectrum of the speaker of interest. Experimental results shown in this paper, prove that the results are better than those obtained by estimating the hard mask.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using audio and visual information for single channel speaker separation

This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single chann...

متن کامل

Smooth soft mel-spectrographic masks based on blind sparse source separation

This paper investigates the use of DUET, a recently proposed blind source separation method, as front-end for missing data speech recognition. Based on the attenuation and delay estimation in stereo signals soft time-frequency masks are designed to extract a target speaker from a mixture containing multiple speech sources. A postprocessing step is introduced in order to remove isolated mask poi...

متن کامل

Speaker separation using visually-derived binary masks

This paper is concerned with the problem of single-channel speaker separation and exploits visual speech information to aid the separation process. Audio from a mixture of speakers is received from a single microphone and to supplement this, video from each speaker in the mixture is also captured. The visual features are used to create a time-frequency binary mask that identifies regions where ...

متن کامل

Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources

This paper extends a deep clustering algorithm for use with time-frequency masking-based beamforming and perform separation with an unknown number of sources. Deep clustering is a recently proposed single-channel source separation algorithm, which projects inputs into the embedding space and performs clustering in the embedding domain. In deep clustering, bi-directional long short-term memory (...

متن کامل

Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks

Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004